1
00:00:11,660 --> 00:00:08,900
yeah my name is Kenny I know it's really

2
00:00:12,890 --> 00:00:11,670
difficult sorry and I guess I'm one of

3
00:00:15,619 --> 00:00:12,900
the people that you don't envy because

4
00:00:20,150 --> 00:00:15,629
I'm going to try to talk about amino

5
00:00:21,769 --> 00:00:20,160
acid biosynthetic pathways and so amino

6
00:00:23,810 --> 00:00:21,779
acids are critical and that they are the

7
00:00:25,910 --> 00:00:23,820
constituents of the proteins which carry

8
00:00:27,320 --> 00:00:25,920
out the functions in a cell and it's

9
00:00:29,810 --> 00:00:27,330
thought that the last Universal common

10
00:00:33,110 --> 00:00:29,820
ancestor or the root of the tree of life

11
00:00:36,560 --> 00:00:33,120
is able to prevail to produce all

12
00:00:38,239 --> 00:00:36,570
currently available amino acids so I'm

13
00:00:40,569 --> 00:00:38,249
interested in characterizing things at

14
00:00:42,500 --> 00:00:40,579
the last Universal common ancestor and

15
00:00:44,059 --> 00:00:42,510
obviously in the context of the

16
00:00:50,000 --> 00:00:44,069
progression of the central dogma or

17
00:00:52,630 --> 00:00:50,010
these other molecular processes so uh so

18
00:00:55,610 --> 00:00:52,640
the first question that I had with this

19
00:00:58,399 --> 00:00:55,620
project is whether the conservation of

20
00:01:00,919 --> 00:00:58,409
proteins associated with amino acid

21
00:01:02,959 --> 00:01:00,929
biosynthesis would be related to a

22
00:01:08,750 --> 00:01:02,969
characteristic of the amino acid like

23
00:01:10,940 --> 00:01:08,760
the complexity of them so I I have a by

24
00:01:14,900 --> 00:01:10,950
automatic approach where I compare

25
00:01:18,260 --> 00:01:14,910
features across species or the whole

26
00:01:20,900 --> 00:01:18,270
tree of life based on either the gene or

27
00:01:23,240 --> 00:01:20,910
protein level and extrapolate this

28
00:01:28,040 --> 00:01:23,250
information on to the last Universal

29
00:01:30,740 --> 00:01:28,050
common ancestor so a super cool a

30
00:01:32,240 --> 00:01:30,750
postdoc in the lab Aaron who seems to be

31
00:01:35,390 --> 00:01:32,250
here in spirit is he got a shout out

32
00:01:38,930 --> 00:01:35,400
earlier to curated this database called

33
00:01:40,910 --> 00:01:38,940
Luca pedia which basically is a

34
00:01:42,380 --> 00:01:40,920
combination of several different data

35
00:01:44,330 --> 00:01:42,390
sets that attempt to characterize

36
00:01:46,160 --> 00:01:44,340
ancient characteristics of proteins

37
00:01:49,430 --> 00:01:46,170
based on these universal type

38
00:01:53,360 --> 00:01:49,440
distributions and they use different

39
00:01:55,100 --> 00:01:53,370
approaches so there's different types of

40
00:01:58,030 --> 00:01:55,110
ways you can approach this obviously you

41
00:02:01,240 --> 00:01:58,040
can do it by gene families by

42
00:02:04,070 --> 00:02:01,250
universally distributed protein motifs

43
00:02:06,500 --> 00:02:04,080
some of these studies use different

44
00:02:09,080 --> 00:02:06,510
structural perspectives as in protein

45
00:02:12,500 --> 00:02:09,090
folds or groups of protein fold

46
00:02:17,059 --> 00:02:12,510
functions and common reactions across

47
00:02:19,370 --> 00:02:17,069
the domains of life so in terms of

48
00:02:24,320 --> 00:02:19,380
complexity one of the reference

49
00:02:26,150 --> 00:02:24,330
I'm using his proxy is the abundance of

50
00:02:28,310 --> 00:02:26,160
certain prebiotic synthesis experiments

51
00:02:30,830 --> 00:02:28,320
so here I just have a table of the

52
00:02:32,450 --> 00:02:30,840
genetic code where I have the codon on

53
00:02:36,020 --> 00:02:32,460
the left and the corresponding amino

54
00:02:37,640 --> 00:02:36,030
acid on the right and the shading just

55
00:02:41,540 --> 00:02:37,650
corresponds with higher abundance in

56
00:02:44,930 --> 00:02:41,550
these experiments so I use a pretty

57
00:02:46,760 --> 00:02:44,940
simple diaphragmatic pipeline I start

58
00:02:48,950 --> 00:02:46,770
with a set of amino acid related

59
00:02:50,210 --> 00:02:48,960
proteins and I match these with things

60
00:02:53,690 --> 00:02:50,220
that I found in at least three of the

61
00:02:56,120 --> 00:02:53,700
studies in the database and I couple

62
00:02:57,650 --> 00:02:56,130
that with some ID matching which i use

63
00:03:01,130 --> 00:02:57,660
the ends on commission code for these

64
00:03:03,290 --> 00:03:01,140
proteins which is just a annotation for

65
00:03:05,690 --> 00:03:03,300
classes of enzyme function where you

66
00:03:08,120 --> 00:03:05,700
have four digits in each digit confers

67
00:03:09,920 --> 00:03:08,130
more specificity and description of the

68
00:03:11,780 --> 00:03:09,930
function of the enzyme and I did this

69
00:03:13,880 --> 00:03:11,790
within without the last digit which

70
00:03:17,450 --> 00:03:13,890
usually describes some substrate

71
00:03:20,300 --> 00:03:17,460
specificity of the enzyme so then I've

72
00:03:23,240 --> 00:03:20,310
mapped these things onto whoa then I met

73
00:03:24,920 --> 00:03:23,250
these things onto specific pathways to

74
00:03:30,320 --> 00:03:24,930
see where things were conserved in pasay

75
00:03:32,150 --> 00:03:30,330
puces so here's a one really cool

76
00:03:34,460 --> 00:03:32,160
pathway and what you're looking at in

77
00:03:36,410 --> 00:03:34,470
red throughout the pathway of the enzyme

78
00:03:39,380 --> 00:03:36,420
the three digit enzyme Commission codes

79
00:03:40,730 --> 00:03:39,390
that are conserved or match three at

80
00:03:43,820 --> 00:03:40,740
least three of these papers and the data

81
00:03:45,950 --> 00:03:43,830
set in the database and it looks a

82
00:03:48,199 --> 00:03:45,960
little daunting at first but if I peel

83
00:03:50,240 --> 00:03:48,209
off things that are not conserved you

84
00:03:52,850 --> 00:03:50,250
can see that there are these functional

85
00:03:56,390 --> 00:03:52,860
cores that seem to be conserved

86
00:03:58,250 --> 00:03:56,400
throughout the pathway so while the main

87
00:04:01,010 --> 00:03:58,260
products in this pathway are sharing

88
00:04:03,199 --> 00:04:01,020
glycine and threonine you can still see

89
00:04:07,150 --> 00:04:03,209
some really close metabolic proximity as

90
00:04:09,680 --> 00:04:07,160
and steps of reactions to get to other

91
00:04:12,260 --> 00:04:09,690
amino acids there's tryptophan over here

92
00:04:14,900 --> 00:04:12,270
since a cytosine then methenamine over

93
00:04:20,060 --> 00:04:14,910
here and other amino acid biosynthetic

94
00:04:23,210 --> 00:04:20,070
pathways as well as metabolism so when I

95
00:04:25,540 --> 00:04:23,220
looked at the main products of this

96
00:04:28,640 --> 00:04:25,550
pathway in in terms of these abundance

97
00:04:31,360 --> 00:04:28,650
proxies there's no direct correlation

98
00:04:32,980 --> 00:04:31,370
between the conservation and the

99
00:04:36,910 --> 00:04:32,990
complexity or

100
00:04:40,600 --> 00:04:36,920
of the amino acid which was at first not

101
00:04:43,060 --> 00:04:40,610
what I expected so another path or that

102
00:04:45,520 --> 00:04:43,070
is probably really representative of the

103
00:04:49,390 --> 00:04:45,530
convergence and not entirely surprising

104
00:04:51,790 --> 00:04:49,400
is valine isoleucine and loosing pathway

105
00:04:54,190 --> 00:04:51,800
in which the pathways are identical

106
00:04:57,850 --> 00:04:54,200
until you get to this last step of this

107
00:04:59,950 --> 00:04:57,860
last reaction so that's probably the

108
00:05:05,260 --> 00:04:59,960
best example of convergent pathways and

109
00:05:07,390 --> 00:05:05,270
given these sets so throughout their

110
00:05:09,700 --> 00:05:07,400
data oh man so throughout the data

111
00:05:12,340 --> 00:05:09,710
there's a there were several nodal

112
00:05:15,460 --> 00:05:12,350
proteins or consistently represented

113
00:05:17,200 --> 00:05:15,470
enzymes in all of these pathways or most

114
00:05:19,090 --> 00:05:17,210
of these pathways and that was

115
00:05:22,900 --> 00:05:19,100
depends synthase alpha chain and searing

116
00:05:24,400 --> 00:05:22,910
methyl ease and right so like I said

117
00:05:26,920 --> 00:05:24,410
their presidents everly these super

118
00:05:29,410 --> 00:05:26,930
pathways and they both um convert

119
00:05:30,850 --> 00:05:29,420
different functions to prevent synthase

120
00:05:33,220 --> 00:05:30,860
obviously the last two sets of

121
00:05:35,170 --> 00:05:33,230
tryptophan biosynthesis and searing

122
00:05:37,390 --> 00:05:35,180
methylase catalyzing the searing the

123
00:05:39,160 --> 00:05:37,400
glycine reaction as well as hydrolysis

124
00:05:41,530 --> 00:05:39,170
of tetrahydrofolate which is just the

125
00:05:44,920 --> 00:05:41,540
common cofactor in amino acid metabolism

126
00:05:47,860 --> 00:05:44,930
as well as nucleotide metabolism so I

127
00:05:49,620 --> 00:05:47,870
tried to find of models of enzyme

128
00:05:52,960 --> 00:05:49,630
evolution that would possibly fit the

129
00:05:54,610 --> 00:05:52,970
data I have and the patchwork model

130
00:05:57,220 --> 00:05:54,620
seems to be a really popular model that

131
00:06:00,340 --> 00:05:57,230
may correspond where you start out with

132
00:06:04,110 --> 00:06:00,350
these initially broadly reactive enzymes

133
00:06:08,110 --> 00:06:04,120
like red green and blue pac-man shape

134
00:06:09,280 --> 00:06:08,120
thing and you they do sir favor a

135
00:06:12,280 --> 00:06:09,290
certain reaction but they're still

136
00:06:14,620 --> 00:06:12,290
broadly reactive and after gene

137
00:06:17,440 --> 00:06:14,630
duplication the least thing and

138
00:06:20,140 --> 00:06:17,450
selection from the environment things

139
00:06:21,910 --> 00:06:20,150
become more specific in their function

140
00:06:24,610 --> 00:06:21,920
so I thought maybe this could help

141
00:06:27,540 --> 00:06:24,620
explain or maybe I'm seeing an artifact

142
00:06:30,610 --> 00:06:27,550
of this with these multifunctional nodes

143
00:06:33,360 --> 00:06:30,620
also another popular theory is a semi

144
00:06:35,800 --> 00:06:33,370
enzymatic theory where you have

145
00:06:39,150 --> 00:06:35,810
reactions that become linked some way

146
00:06:41,440 --> 00:06:39,160
and also it incorporates the use of

147
00:06:46,170 --> 00:06:41,450
spontaneous reactions and development of

148
00:06:48,360 --> 00:06:46,180
these processes so I guess my main

149
00:06:50,610 --> 00:06:48,370
inclusions are that you are able to

150
00:06:51,870 --> 00:06:50,620
identify certain functional chords that

151
00:06:56,129 --> 00:06:51,880
are conserved throughout these pathways

152
00:06:57,960 --> 00:06:56,139
and possibly their support for current

153
00:07:01,830 --> 00:06:57,970
models of enzyme evolution with this

154
00:07:04,290 --> 00:07:01,840
data with these data and I have to

155
00:07:06,150 --> 00:07:04,300
acknowledge my pile or landlubber other

156
00:07:08,760 --> 00:07:06,160
members of the lab do which are in the

157
00:07:10,379 --> 00:07:08,770
audience but not in this picture and of

158
00:07:12,060 --> 00:07:10,389
course Aaron Goldman who is responsible

159
00:07:20,999 --> 00:07:12,070
for curating the database that I worked

160
00:07:27,330 --> 00:07:21,009
so closely with so thank have any

161
00:07:32,610 --> 00:07:27,340
questions for Kinner II oh okay well

162
00:07:35,490 --> 00:07:32,620
then I have one um I don't know a whole

163
00:07:37,230 --> 00:07:35,500
lot about where in the cells amino acid

164
00:07:39,450 --> 00:07:37,240
biosynthesis take place if it's all in

165
00:07:42,960 --> 00:07:39,460
one place or if it's spread out is there

166
00:07:44,790 --> 00:07:42,970
any locational tendencies I realize it

167
00:07:47,879 --> 00:07:44,800
might not be a fair question well I'm

168
00:07:49,620 --> 00:07:47,889
I'm not quite sure I don't and this is

169
00:07:53,580 --> 00:07:49,630
probably personal I don't think of it as

170
00:07:59,430 --> 00:07:53,590
a localized reset of reactions but not